Abstract


Any result can be generated randomly and any random result is useless. Traditional methods
define uncertainty as a measure of the dispersion around the true value and are based on the
hypothesis that any divergence from uniformity is the result of a deterministic event.


1. Introduction


Any result can be generated randomly and any random result is useless. Traditional methods (Bich et al., 2006;
Grabe, 2005) define uncertainty as a measure of the dispersion around the true value and are based on the hypothesis
that any divergence from uniformity is the result of a deterministic event. The problem with this approach is that even
non-uniform distributions can be generated randomly, and the probability of this event rises as the number of
hypotheses tested increases. Consequently, there is a risk of treating a random, and therefore non-repeatable,
hypothesis as deterministic. Indeed, it is believed that this practice is the cause of the high number of
non-reproducible results (Munafò et al., 2017; Ioannidis, 2005). Therefore, we believe that the probability of obtaining
an equal or better result randomly is the true uncertainty of the statistical data, because it determines the probability
that the data is useful, and therefore the validity of any other analysis depends on this parameter.


In addition, we will also address the problem of determining the correct method of calculating the probability of
obtaining an equal or better result randomly. On this topic, we will see that the fundamental point in calculating
this probability is to consider the statistical data as dependent on all the other data generated by all the tested
hypotheses.


Considering the statistical data as non-independent has fundamental implications in statistical analysis. Indeed,
all our random actions are not only useless, but also increase the uncertainty of the statistical data. For this reason, in
the following paper (Berdondini, 2021), we highlight the importance of acting consciously in statistics.


Furthermore, the evaluation of the uncertainty of the statistical data is possible only by knowing all the
attempts made. In practice, the calculation of uncertainty is very difficult because we must consider not only all our
own attempts, but also the attempts made by every other person who is performing the same task as us.
In this way, the uncertainty of our statistical data also depends on the actions performed by the people who are
working on the same analysis. Indeed, a group of people who belong to a research network, all have the same
reputation, and all work on the same problem can be considered as one person who carries out all the attempts made.
Consequently, the calculation of uncertainty becomes something relative that depends on the information we have.


2. Definition of Uncertainty 


The aim of the definition of uncertainty of the statistical data that we are going to give is to determine a parameter that
is linked to the repeatability of the result and that is universal, and therefore independent of the system in which we
perform the statistical analysis.


We define the uncertainty of the statistical data as the probability of obtaining an equal or better result randomly. 


This definition treats the statistical data as a forecast, and a forecast is repeatable only if the process that
generated it is non-random. Consequently, the calculation of uncertainty involves determining the type of process
that generated the result. We can distinguish cognitive processes from random processes by their statistical property
of generating results that cannot be reproduced randomly. Indeed, by using information about the system on which we
are performing a measurement, we can increase our probability of making a correct forecast, and this leads to a
consequent decrease in the probability of obtaining the same result randomly.


It is interesting to note that the repeatability of the statistical data and non-randomness of the process that 
produced it are two equivalent concepts. Indeed, the information leads to the repeatability of the result and at the 
same time generates results that cannot be reproduced randomly. 


To understand the definition given, we report the following example. We have to analyze a statistical datum
represented by 1000 predictions on an event that can have only two results. The 1000 predictions are divided into 600
successes and 400 failures. To calculate the probability of obtaining an equal or better result in a random way, we use
the binomial distribution with a success probability of 1/2 and we obtain the following value:

$$P(X \geq 600) = \sum_{k=600}^{1000} \binom{1000}{k} \left(\frac{1}{2}\right)^{1000} \approx 1.4 \times 10^{-10}$$


Now, instead, let us consider a statistical datum represented by 10 predictions divided into 8 successes and 2 
failures. In this case, the probability of getting an equal or better result randomly is 5.5%. 
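

The two probabilities in these examples can be checked with a short sketch. It assumes that "randomly" means each prediction is an independent fair coin flip (success probability 1/2), which is our reading of the examples; the function name `prob_equal_or_better` is ours and not part of the original text.

```python
from fractions import Fraction
from math import comb


def prob_equal_or_better(successes: int, trials: int, p=Fraction(1, 2)) -> float:
    """Return P(X >= successes) for X ~ Binomial(trials, p).

    The sum is carried out with exact fractions so that very small
    tail probabilities are not lost to floating-point underflow.
    """
    p = Fraction(p)
    tail = sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )
    return float(tail)


# 600 successes out of 1000 predictions
print(prob_equal_or_better(600, 1000))
# 8 successes out of 10 predictions
print(prob_equal_or_better(8, 10))
```

The exact-fraction arithmetic is a design choice for the 1000-prediction case, where the individual terms are far smaller than typical floating-point tolerances.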


Comparing the two results, we note that in the first case, although the success rate is only 60%, the
uncertainty is almost zero, while in the second case, with a success rate of 80%, the uncertainty is much
higher. This difference is due to the fact that the definition given, as mentioned, concerns only the repeatability of the
result and not its accuracy. Therefore, it is a value that decreases as the number of repetitions of the result increases.
The approach presented is very different from the classic approach, where uncertainty is seen as a measure of the
dispersion of the data with respect to the true value.


The fundamental point to understand is that the probability that statistical data is completely random and the
estimate of its random component (dispersion around the true value) are two parameters that are only partially
dependent on each other. The first decreases as the number of repetitions of the measurement increases; the second
does not. This is one of the reasons why the traditional definition of uncertainty is, in many cases, not significant
with regard to the repeatability of the result.
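

To illustrate how the first parameter shrinks with the number of repetitions, the following sketch holds the success rate fixed at 60% and varies the number of predictions. The fair-coin model of a random process and the helper name `random_tail` are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb


def random_tail(successes: int, trials: int) -> float:
    """P(X >= successes) for X ~ Binomial(trials, 1/2), summed exactly."""
    favourable = sum(comb(trials, k) for k in range(successes, trials + 1))
    return float(Fraction(favourable, 2**trials))


# Same 60% success rate, increasing number of predictions
for trials in (10, 100, 1000):
    successes = (6 * trials) // 10
    print(trials, successes, random_tail(successes, trials))
```

Under these assumptions the observed success rate never changes, yet the probability of matching it by chance falls by many orders of magnitude as the number of predictions grows.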


The problem, as we have seen in the examples, is that there is always a greater or lesser probability that a purely
random process generated the result. In that case, any analysis turns out to be wrong. For this reason, this value is
considered the true uncertainty of the statistical result.


3. Calculation of the Uncertainty of the Statistical Data 


Correctly calculating the probability of getting an equal or better result randomly involves changing our approach to
statistics. The approach commonly used in statistics is to consider the data produced by one method as independent of
the data produced by different methods. This way of proceeding seems the only possible one but, as we will show in
the following paradox, it leads to an illogical result, which is instead resolved by considering the data as
non-independent.


Imagine that we have a computer with enormous computational capacity that is used to develop hypotheses about a
phenomenon that we want to study. The computer works as follows: it creates a random hypothesis and then
performs a statistical test. At this point, we ask ourselves the following question: can there be a useful statistical test
to evaluate the results of the hypothesis generated?


If we answer yes, we get an illogical result because our computer would always be able, by generating a large 
number of random hypotheses, to find a hypothesis that passes the statistical test. In this way, we arrive at the absurd 
conclusion that it is possible to create knowledge randomly, because it is enough to have a very powerful computer 
and a statistical test to understand every phenomenon. 


If we answer no, we get another illogical result because we are saying that no hypothesis can be evaluated. In 
practice, the results of different hypotheses are all equivalent and indistinguishable. 


How can we solve this logical paradox? The only way to answer the question without obtaining an illogical
situation is to consider the results obtained from different methods as dependent on each other. A function that meets
this condition is the probability of getting an equal or better result at random. Indeed, the calculation of this probability
implies the random simulation of all the actions performed. Hence, random attempts increase the number of actions
performed and consequently increase the probability of obtaining an equal or better result randomly. For this reason,
generating random hypotheses is useless. Therefore, if we use this parameter as a measure of uncertainty, it is
possible to evaluate the data and, at the same time, it is impossible to create knowledge by generating random
hypotheses.
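

The paradox and its resolution can be sketched numerically. The simulation below is a hypothetical stand-in for the computer described above: each random hypothesis makes 10 fair-coin predictions and "passes" a naive test if it scores at least 8. The second function evaluates the probability that at least one of a given number of random attempts does equally well or better, which is how we read the paper's examples; all names and the threshold of 8 out of 10 are illustrative assumptions.

```python
import random
from math import comb


def single_attempt_tail(successes: int, trials: int) -> float:
    """P(X >= successes) for one random attempt, X ~ Binomial(trials, 1/2)."""
    return sum(comb(trials, k) for k in range(successes, trials + 1)) / 2**trials


def uncertainty_over_attempts(successes: int, trials: int, attempts: int) -> float:
    """Probability that at least one of `attempts` random attempts
    reaches `successes` or more (attempts assumed independent)."""
    p = single_attempt_tail(successes, trials)
    return 1 - (1 - p) ** attempts


def count_random_passes(attempts: int, trials: int, threshold: int, rng) -> int:
    """Simulate random hypotheses and count how many pass the naive test."""
    passes = 0
    for _ in range(attempts):
        hits = sum(rng.random() < 0.5 for _ in range(trials))
        if hits >= threshold:
            passes += 1
    return passes


rng = random.Random(0)
print("random hypotheses passing the naive test:",
      count_random_passes(1000, 10, 8, rng))

# Uncertainty of an 8-out-of-10 result once all attempts are counted
for attempts in (1, 10, 100, 1000):
    print(attempts, uncertainty_over_attempts(8, 10, attempts))
```

In this sketch the naive test lets some random hypotheses through whenever enough of them are generated, while the attempt-aware uncertainty approaches 1 as the number of attempts grows, so the extra attempts yield no apparent knowledge.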


Considering the statistical data as non-independent is a fundamental condition for correctly calculating the 
uncertainty. The probability of getting an equal or better result at random meets this condition. 


The dependence of statistical data on each other has profound implications in statistics, which will be discussed 
in the next section. 


4. Consequences of the Non-Independence of the Statistical Data 


Considering the statistical data dependent on each other in the calculation of uncertainty leads to three fundamental 
consequences in statistics. 


First Fundamental Consequence of the Non-independence of the Statistical Data: Our every random action always 
involves an increase in the uncertainty of the statistical data. 


Example: We need to analyze a statistical datum represented by 10 predictions about an event that can only have two
results. The 10 predictions are divided into 8 successes and 2 failures. To calculate the probability of obtaining an
equal or better result randomly, we use the binomial distribution and obtain 5.5%. If, before making
these 10 predictions, we had tested a different hypothesis with which we made 10 other predictions divided into 5
successes and 5 failures, the uncertainty of our result changes. Indeed, in this case, we must calculate the probability
of obtaining a result with a number of successes greater than or equal to 8 by performing two random attempts
consisting of 10 predictions each. In this case, the probability becomes 10.6%, so having first tested a
random hypothesis almost doubled the uncertainty of our second hypothesis. Consequently, increasing the number of
random hypotheses increases the number of predictions that we will have to make, with the true hypothesis, to reach an
acceptable uncertainty.
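

As a worked check of the two figures in this example, assume that each random attempt is an independent sequence of 10 fair-coin predictions (our reading of the text). With $P_1$ the probability for a single attempt and $P_2$ the probability that at least one of two attempts reaches 8 or more successes:

```latex
\begin{align*}
P_1 &= \sum_{k=8}^{10} \binom{10}{k} \left(\frac{1}{2}\right)^{10}
     = \frac{45 + 10 + 1}{1024} = \frac{56}{1024} \approx 5.5\%, \\
P_2 &= 1 - (1 - P_1)^2 \approx 10.6\%.
\end{align*}
```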


Second Fundamental Consequence of the Non-independence of the Statistical Data: every random action of ours and 
of every other person equivalent to us, always involves an increase in the uncertainty of the statistical data. 


By "equivalent", we mean a person with the same reputation as us, so that the data produced by
equivalent people are judged with the same weight.


Example: 10 people participate in a project whose goal is the development of an algorithm capable of predicting the
outcome of an event that can have only two results. An external person who does not participate in the project but is
aware of every attempt made by the participants evaluates the statistical data obtained. All participants make 100
predictions; 9 obtain a success rate of 50% and one obtains a success rate of 65%. The uncertainty of the statistical
data of the participant who obtains a success rate of 65% is obtained by calculating the probability of obtaining a
result with a number of successes greater than or equal to 65 by performing ten random attempts consisting of 100
predictions each. The probability obtained in this way is about 1.7%; if instead he had been the only participant in
the project, the probability would have been 0.18%, therefore about 10 times lower.


Third Fundamental Consequence of the Non-independence of the Statistical Data: The calculation of the uncertainty
varies according to the information possessed.


Example: 10 people participate in a project whose goal is the development of an algorithm capable of predicting the
outcome of an event that can have only two results. In this case, the participants do not know about each other, and
each thinks they are the only one participating in the project. All participants make 100 predictions; 9 obtain a
success rate of 50% and one obtains a success rate of 65%. The participant who obtains a success rate of 65%
independently calculates the uncertainty of the result obtained. Not knowing that other people are participating in
the project, this participant calculates the probability of obtaining a result with a number of successes greater than
or equal to 65 by performing a single random attempt consisting of 100 predictions; the probability obtained is 0.18%.
An external person who is aware of every attempt made by the participants also calculates the uncertainty of the
statistical data of the participant who obtains a success rate of 65%. This person calculates the probability of
obtaining a result with a number of successes greater than or equal to 65 by making ten random attempts consisting of
100 predictions each. The probability obtained in this way is about 1.7%, a much higher value than the uncertainty
calculated by the participant. The uncertainty value calculated by the external person, using more information, is more
accurate than the uncertainty value calculated by the individual participant. Consequently, the uncertainty value
obtained by exploiting the greatest amount of information must always be preferred; in the case of the example, the
most accurate uncertainty is 1.7%.
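

The dependence of the uncertainty on the information possessed can be sketched as a function of the number of attempts known to whoever performs the calculation. The sketch assumes independent fair-coin attempts and takes "equal or better in at least one attempt" as the event of interest, which is our reading of the examples; the function and parameter names are illustrative.

```python
from math import comb


def single_attempt_tail(successes: int, trials: int) -> float:
    """P(X >= successes) for X ~ Binomial(trials, 1/2)."""
    favourable = sum(comb(trials, k) for k in range(successes, trials + 1))
    return favourable / 2**trials


def uncertainty(successes: int, trials: int, known_attempts: int) -> float:
    """Probability that at least one of the known random attempts
    reaches `successes` or more (attempts assumed independent)."""
    p = single_attempt_tail(successes, trials)
    return 1.0 - (1.0 - p) ** known_attempts


# The participant knows only of their own attempt
participant_view = uncertainty(65, 100, known_attempts=1)
# The external observer knows of all ten attempts
observer_view = uncertainty(65, 100, known_attempts=10)

print(f"participant: {participant_view:.2%}")
print(f"observer:    {observer_view:.2%}")
```

The same result of 65 successes out of 100 receives a different uncertainty depending only on `known_attempts`, which is the relativity described in the third consequence.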


The first and second fundamental consequences of the non-independence of the statistical data can
be restated by highlighting the non-randomness of the action.


First Fundamental Consequence of the Non-independence of the Statistical Data: Our every non-random action 
always involves a decrease in the uncertainty of the statistical data. 


Second Fundamental Consequence of the Non-independence of the Statistical Data: Every non-random action of ours 
and of every other person equivalent to us, always involves a decrease in the uncertainty of the statistical data. 


5. Conclusion 


The traditional definition of uncertainty implies accepting as true, for non-homogeneous data dispersions, the
hypothesis that the result is not completely random. We consider this assumption to be the main problem of the
definition of uncertainty. Indeed, whatever the statistical data obtained, there is always a possibility that they are
completely random and therefore useless.


This error stems from the fact that the definition of uncertainty was developed in an environment where each
method had a strong deterministic component. In that setting, calculating the probability of obtaining an equal or better
result at random might seem useless. However, when we apply statistics in fields such as finance, where the random
component is predominant, the traditional approach to uncertainty turns out to be unsuccessful. It fails for the simple
reason that the hypothesis on which it is based may not be true. For this reason, we have defined the uncertainty of
the statistical data as the probability of obtaining an equal or better result randomly. Since this definition of uncertainty
is not linked to any hypothesis, it turns out to be universal. The correct calculation of this probability value implies
considering the statistical data as dependent on each other. This assumption, as we have shown through a paradox,
makes the given definition of uncertainty consistent with the logical principle that it is not possible to create knowledge
randomly.

The non-independence of the statistical data implies that each action performed has an effect on the 
calculation of uncertainty. The interesting aspect is that a dependence is also created between actions performed 
by different people. Consequently, the calculation of uncertainty depends on the information in our possession, 
so it becomes something relative that can be determined absolutely only with complete knowledge of the 
information. 